Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

luriveros
by New Contributor
  • 5479 Views
  • 1 reply
  • 0 kudos

implementing liquid clustering for DataFrames directly

 Hi!! I have a question: is it possible to implement liquid clustering for DataFrames saved directly as Delta files (df.write.format("delta").save("path")), rather than via the conventional approach involving table creation?

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi, hopefully this question is related to testing and any production data would get persisted to a table, but one example is:
df = (spark.range(10).write.format("delta").mode("append").save("file:/tmp/data"))
ALTER TABLE delta.`file:/tmp/data` CLUSTER BY...

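A minimal runnable sketch of brockb's approach, assuming a throwaway path-based Delta table on a runtime that supports liquid clustering; the path and clustering column are illustrative:

# Write a small Delta table to a path (test data only)
spark.range(10).write.format("delta").mode("append").save("file:/tmp/data")

# Enable liquid clustering on the path-based table, then optimize to cluster the data
spark.sql("ALTER TABLE delta.`file:/tmp/data` CLUSTER BY (id)")
spark.sql("OPTIMIZE delta.`file:/tmp/data`")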
pshuk
by New Contributor III
  • 1327 Views
  • 2 replies
  • 0 kudos

file transfer through CLI to DBFS, working manually but not in python code...

Hi, I ran my code successfully in the past but suddenly it stopped working. I have Python code that transfers local files to a DBFS location using the CLI. When I run the command manually on the screen it works, but in the code it gives me the error "retu...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

The 127 error code indicates “command not found”. Try using the full path of the databricks command.

1 More Replies
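A minimal sketch of feiyun0112's suggestion applied from Python, assuming the CLI binary lives at /usr/local/bin/databricks (run `which databricks` in a shell where the command works to find the real path); all file paths are illustrative:

import subprocess

# Call the CLI by absolute path so the subprocess does not depend on PATH
result = subprocess.run(
    ["/usr/local/bin/databricks", "fs", "cp", "/local/data/file.csv", "dbfs:/target/file.csv"],
    capture_output=True,
    text=True,
)
# Return code 127 from a shell means the command itself was not found
print(result.returncode, result.stdout, result.stderr)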
Ramakrishnan83
by New Contributor III
  • 2376 Views
  • 1 reply
  • 0 kudos

Optimize and Vacuum Command

Hi team, I am running a weekly purge process from Databricks notebooks that cleans up chunks of records from my tables used for audit purposes. The tables are external tables. I need clarification on the items below: 1. Should I run the Optimize and Vacuum c...

Latest Reply
Hkesharwani
Contributor II
  • 0 kudos

Hi Ramakrishnan83, 1. The VACUUM command only works with Delta tables; it deletes Parquet files older than the retention period, which is 7 days by default. OPTIMIZE instead compacts small files together (co-locating data by a column if one is provided). 2. ...

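A minimal sketch of that maintenance pattern with an illustrative table name; the 168-hour retention is simply the 7-day default made explicit:

# Compact small files, optionally co-locating rows by a frequently filtered column
spark.sql("OPTIMIZE audit_db.purge_log ZORDER BY (event_date)")

# Remove unreferenced data files older than the retention window
spark.sql("VACUUM audit_db.purge_log RETAIN 168 HOURS")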
kazinahian
by New Contributor III
  • 2442 Views
  • 2 replies
  • 1 kudos

Resolved! How can I Learn Databricks Data Pipeline in Azure environment?

Hello Esteemed Community, I have a fundamental question to ask, and I approach it with a sense of humility. Your guidance in my learning journey would be greatly appreciated. I am eager to learn how to build a hands-on data pipeline within the Databri...

Latest Reply
Palash01
Valued Contributor
  • 1 kudos

Hey @kazinahian, I completely understand your hesitation and appreciate your approach to seeking guidance! Embarking on a learning journey can be daunting, especially when financial considerations are involved. I'm happy to offer some advice on buildi...

1 More Replies
Tripalink
by New Contributor III
  • 6328 Views
  • 4 replies
  • 1 kudos

Error. Another git operation is in progress.

I am getting an error every time I try to view another branch or create a branch. Sometimes this has happened in the past, but usually seems to fix itself after about 10-30 minutes. This error has been lasting for over 12 hours, so I am now concerned...

Latest Reply
Hakuna_Madata
New Contributor II
  • 1 kudos

I had the same problem and I could resolve it by creating the repo again with a trailing ".git" in the Git repository URL. For example, use this:
https://gitlab.mycompany.com/my-project/my-repo.git
not this:
https://gitlab.mycompany.com/my-project/my-repo...

3 More Replies
Arnold_Souza
by New Contributor III
  • 3420 Views
  • 3 replies
  • 0 kudos

Unable to enable entitlements to account groups in a workspace

Currently, I am both an account administrator and a workspace administrator in Databricks. When I try to enable the entitlements "Workspace access" and "Databricks SQL access" for account groups, I receive the error "Failed to enable entitlem...

Latest Reply
saikumar246
Databricks Employee
  • 0 kudos

Hi @Arnold_Souza, The error "Failed to enable entitlement.: Group not found" that you're experiencing when trying to enable the entitlements “Workspace access” and “Databricks SQL access” for account groups is likely due to the fact that Identity Fed...

2 More Replies
Martinitus
by New Contributor III
  • 6032 Views
  • 4 replies
  • 0 kudos

CSV Reader reads quoted fields inconsistently in last column

I just opened another issue: https://issues.apache.org/jira/browse/SPARK-46959. It corrupts data even when read with mode="FAILFAST"; I consider it critical, because basic stuff like this should just work!

Latest Reply
Martinitus
New Contributor III
  • 0 kudos

Either: [ 'some text', 'some text"', 'some text"' ]
Alternatively: [ '"some text"', 'some text"', 'some text"' ]
Probably the most sane behavior would be a parser error (with mode="FAILFAST"). Just parsing garbage without warning the user is certainly not...

3 More Replies
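For context, a minimal sketch of a strict CSV read where malformed quoting raises an error instead of silently producing garbage; the path and quoting options are illustrative:

# FAILFAST aborts the read on any malformed record rather than guessing
df = (
    spark.read
    .option("mode", "FAILFAST")
    .option("header", "true")
    .option("quote", '"')   # field quoting character
    .option("escape", '"')  # how embedded quotes are escaped
    .csv("dbfs:/tmp/quoted_fields.csv")
)
df.show()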
tomph
by New Contributor II
  • 2366 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Asset Bundles - Manage existing jobs

Hello, we are starting to experiment with Databricks Asset Bundles, especially to keep jobs aligned between workspaces. Is there a way to start managing existing jobs, to avoid erasing their previous run history? Thank you, Tommaso

Latest Reply
tomph
New Contributor II
  • 0 kudos

Great news, thanks!

1 More Replies
matt_stanford
by New Contributor III
  • 2612 Views
  • 1 reply
  • 0 kudos

Resolved! Type 2 SCD when using Auto Loader

Hi there! I'm pretty new to using Auto Loader, so this may be a really obvious fix, but it's stumped me for a few weeks, so I'm hoping someone can help! I have a small csv file saved in ADLS with a list of pizzas for an imaginary pizza restaurant. I'...

Latest Reply
matt_stanford
New Contributor III
  • 0 kudos

So, I figured out what the issue was. I needed to delete the checkpoint folder. After I did this and re-ran the notebook, everything worked fine!

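A minimal sketch of that fix, with an illustrative ADLS checkpoint path; note that deleting a checkpoint makes the stream reprocess the source from scratch, so it only suits cases where a full reload is acceptable:

# Recursively delete the Auto Loader / Structured Streaming checkpoint
checkpoint_path = "abfss://container@account.dfs.core.windows.net/checkpoints/pizzas"
dbutils.fs.rm(checkpoint_path, True)  # True = recursive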
AlexPedurand
by New Contributor
  • 1625 Views
  • 1 reply
  • 0 kudos

SOAP API - Connection

Hello, we have a workflow in our team that performs the usual monthly tasks, to be run on the first working day of the month. Each of the ~20 users will run a clone of this workflow, most likely all around the same time but with different options. Because we don...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

Maybe you can set a lock before calling the SOAP API: python - Using a Lock with redis-py - Stack Overflow

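A minimal sketch of that idea with redis-py, assuming a reachable Redis instance; the host, lock name, timeouts, and the call_soap_api() helper are all illustrative:

import redis

r = redis.Redis(host="my-redis-host", port=6379)

# Only one workflow clone can hold the lock at a time; others block for up to 300 s.
# timeout=120 auto-releases the lock if a holder dies mid-call.
with r.lock("soap-api-lock", timeout=120, blocking_timeout=300):
    call_soap_api()  # hypothetical wrapper around the actual SOAP request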
data1233
by New Contributor
  • 1805 Views
  • 1 reply
  • 0 kudos

create an array sorted by a field

How do I create an array from a field while applying sorting? How do I do this in Databricks, since Databricks does not support ORDER BY in array_agg? The same is possible in Snowflake (array_agg) and Redshift (listagg): SELECT ARRAY_AGG(O_ORDERKEY) WITH...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

%sql
SELECT array_sort(
         array_agg(col),
         (left, right) -> CASE WHEN left < right THEN -1 WHEN left > right THEN 1 ELSE 0 END
       ) AS arr_col
FROM VALUES (3), (2), (1) AS tab(col);

https://docs.databricks.com/en/sql/language-manual/functions/array_sort.h...

jerryrard
by New Contributor
  • 1766 Views
  • 2 replies
  • 0 kudos

Python Databricks how to run all cells in another notebook except the last cell

I have a Python Databricks notebook from which I want to call/run another Databricks notebook using dbutils.notebook.run()... but I want to run all the cells in the "called" notebook except the last one. Is there a way to do a count of cells in the called ...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

Alternatively, you can use dbutils.notebook.run to pass parameters, use dbutils.widgets.get in the other notebook to read the parameter values, and then check those values to decide whether to execute the code in the specified cell: h...

1 More Replies
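A minimal sketch of that parameter-passing pattern; the notebook path and parameter name are illustrative:

# Caller notebook: pass a flag telling the called notebook to skip its last cell
dbutils.notebook.run("/Workspace/Users/me/called_notebook", 600, {"run_last_cell": "false"})

# Called notebook, in its final cell: read the flag (with a default for interactive runs)
dbutils.widgets.text("run_last_cell", "true")
if dbutils.widgets.get("run_last_cell") == "true":
    pass  # logic that should only run when the flag is set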
Kai
by New Contributor II
  • 3041 Views
  • 1 reply
  • 0 kudos

Resolved! Differences Between "TEMPORARY STREAMING TABLE" and "TEMPORARY STREAMING LIVE VIEW" in DLT

Hello Databricks community, I'm seeking clarification on the distinctions between the following two syntaxes: CREATE OR REFRESH TEMPORARY STREAMING TABLE and CREATE TEMPORARY STREAMING LIVE VIEW. To my understanding, both of these methods do not store data...

Latest Reply
gabsylvain
Databricks Employee
  • 0 kudos

Hi @Kai, The two syntaxes you're asking about, CREATE OR REFRESH TEMPORARY STREAMING TABLE and CREATE TEMPORARY STREAMING LIVE VIEW, are used in Delta Live Tables and have distinct purposes. CREATE OR REFRESH TEMPORARY STREAMING TABLE: This syntax i...

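A minimal DLT Python sketch contrasting the two, under the thread's reading that the temporary streaming table still materializes results for the pipeline's lifetime while the view persists nothing; the source path and names are illustrative:

import dlt

# Temporary streaming table: results are materialized but not published to the catalog
@dlt.table(temporary=True)
def staged_events():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("dbfs:/tmp/events"))

# Streaming view: nothing persisted; recomputed for downstream consumers
@dlt.view
def events_view():
    return dlt.read_stream("staged_events")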
RyHubb
by New Contributor III
  • 4638 Views
  • 5 replies
  • 0 kudos

Resolved! Databricks asset bundles job and pipeline

Hello, I'm looking to create a job which is linked to a Delta Live Table. Given job code like this:
my_job_name:
  name: thejobname
  schedule:
    quartz_cron_expression: 56 30 12 * * ?
    timezone_id: UTC
    pause_stat...

Latest Reply
Yeshwanth
Databricks Employee
  • 0 kudos

@RyHubb You can specify a variable for the ID and it will be materialized at deploy time. No need to do this yourself. An example is at https://github.com/databricks/bundle-examples/blob/24678f538415ab936e341a04fce207dce91093a8/default_python/...

4 More Replies
leaw
by New Contributor III
  • 6607 Views
  • 7 replies
  • 0 kudos

Resolved! How to load xml files with spark-xml ?

Hello, I cannot load XML files. First, I tried to install the Maven library com.databricks:spark-xml_2.12:0.14.0 as the documentation says, but I could not find it. I only have HyukjinKwon:spark-xml:0.1.1-s_2.10, and with that one I get this error: DRIVE...

Latest Reply
Frustrated_DE
New Contributor III
  • 0 kudos

Mismatch on Scala version, my bad! Sorted

6 More Replies
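A minimal sketch of the working setup once the spark-xml library matching the cluster's Scala version is installed (e.g. com.databricks:spark-xml_2.12:0.14.0 on a Scala 2.12 runtime); the path and rowTag are illustrative:

# Each <record> element becomes one DataFrame row
df = (spark.read.format("xml")
      .option("rowTag", "record")
      .load("dbfs:/tmp/data.xml"))
df.printSchema()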
