Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

noimeta
by Contributor III
  • 12497 Views
  • 14 replies
  • 12 kudos

Resolved! Error when creating an external location using code

I'm trying to create an external location from a notebook, and I got this kind of error: [PARSE_SYNTAX_ERROR] Syntax error at or near 'LOCATION' (line 1, pos 16)   == SQL == CREATE EXTERNAL LOCATION IF NOT EXISTS test_location URL 's3://test-bronze/db/tes...

Latest Reply
Lokeshv
New Contributor II
  • 12 kudos

Hey everyone, I'm facing an issue with retrieving data from a volume or table that contains a string with a symbol, for example 'databricks+'. Whenever I try to retrieve this data, I encounter a syntax error. Can anyone help me resolve this issue?

13 More Replies
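The parse error in the post above usually means the statement ran on compute that is not Unity Catalog-enabled, where `CREATE EXTERNAL LOCATION` is not recognized. A minimal sketch of the full syntax, assuming a UC-enabled cluster or SQL warehouse and an existing storage credential (the credential name and URL below are placeholders, not values from the post):

```sql
-- Assumes Unity Catalog-enabled compute; `my_credential` and the URL
-- are placeholders for an existing storage credential and your bucket.
CREATE EXTERNAL LOCATION IF NOT EXISTS test_location
  URL 's3://my-bucket/some/path'
  WITH (STORAGE CREDENTIAL my_credential);
```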
seefoods
by New Contributor III
  • 1048 Views
  • 2 replies
  • 0 kudos

cluster metrics collection

Hello @Debayan, how can I collect the metrics provided by cluster metrics for Databricks Runtime 13.1 or later using a bash shell script? Cordially, Aubert EMAKO

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, cluster metrics is a UI tool and is available in the UI only. For reference: https://docs.databricks.com/en/compute/cluster-metrics.html

1 More Replies
chari
by Contributor
  • 6825 Views
  • 2 replies
  • 0 kudos

writing spark dataframe as CSV to a repo

Hi, I wrote a Spark dataframe as CSV to a repo (synced with GitHub). But when I checked the folder, the file wasn't there. Here is my code: spark_df.write.format('csv').option('header','true').mode('overwrite').save('/Repos/abcd/mno/data') No error mes...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

The folder '/Repos' is not your repo; it's `dbfs:/Repos`. Please check dbutils.fs.ls('/Repos/abcd/mno/data')

1 More Replies
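As the reply notes, `spark...save('/Repos/...')` resolves to `dbfs:/Repos`, not the workspace repo folder. For a small file that should land in the repo checkout, one option is to write through the driver's local filesystem instead (on Databricks the checkout lives under a path like `/Workspace/Repos/<user>/<repo>`; the temp directory below is a stand-in so this sketch runs anywhere):

```python
import csv
import os
import tempfile

# Stand-in for "/Workspace/Repos/abcd/mno/data" (hypothetical repo path);
# writing through the local filesystem puts the file in the checkout,
# unlike spark.write, which targets DBFS.
target_dir = tempfile.mkdtemp()
path = os.path.join(target_dir, "data.csv")

rows = [("id", "name"), (1, "alpha"), (2, "beta")]
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path) as f:
    print(f.read().strip())
```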
Salman1
by New Contributor
  • 1025 Views
  • 0 replies
  • 0 kudos

Cannot find UDF on subsequent job runs on same cluster.

Hello, I am trying to run jobs with a JAR task type using databricks on AWS on an all-purpose cluster. The issue I'm facing is that the job will complete the first run successfully but on any subsequent runs, it will fail. I have to restart my cluste...

chari
by Contributor
  • 3285 Views
  • 2 replies
  • 0 kudos

Fatal error when writing a big pandas DataFrame

Hello DB community, I was trying to write a pandas dataframe containing 100,000 rows as Excel. Moments into the execution I received a fatal error: "Python kernel is unresponsive." However, I am constrained from increasing the number of clusters or other...

Labels: Data Engineering, Databricks, excel, python
Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @chari, thanks for bringing up your concerns, always happy to help. We understand that you are facing the following error while writing a pandas dataframe containing 100,000 rows to Excel. As per the error >>> Fatal error: The Python kernel ...

1 More Replies
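An unresponsive Python kernel while exporting 100,000 rows is often a driver-memory symptom: the whole workbook gets built in memory at once. One workaround, sketched below under the assumption that CSV is an acceptable substitute for .xlsx, is to stream the rows out in fixed-size batches instead of materializing everything at once:

```python
import csv
import os
import tempfile

# Hypothetical data: 100,000 (id, square) rows, produced lazily.
rows = ((i, i * i) for i in range(100_000))

path = os.path.join(tempfile.mkdtemp(), "out.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "square"])
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == 10_000:      # write out every 10k rows
            writer.writerows(batch)
            batch.clear()
    writer.writerows(batch)           # flush the final partial batch

with open(path) as f:
    print(sum(1 for _ in f) - 1)      # data rows written, excluding header
```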
Yaacoub
by New Contributor
  • 9188 Views
  • 2 replies
  • 1 kudos

[UDF_MAX_COUNT_EXCEEDED] Exceeded query-wide UDF limit of 5 UDFs

In my project I defined a UDF:
@udf(returnType=IntegerType())
def ends_with_one(value, bit_position):
    if bit_position + len(value) < 0:
        return 0
    else:
        return int(value[bit_position] == '1')
spark.udf.register("ends_with_one"...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Yaacoub, Just a friendly follow-up. Have you had a chance to review my colleague's reply? Please inform us if it contributes to resolving your query.

1 More Replies
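For reference, the flattened snippet in the post, reconstructed as plain Python (without the Spark decorator) to show what the function computes. Note that [UDF_MAX_COUNT_EXCEEDED] is, as far as I know, a per-query limit enforced on some shared-access compute; rewriting logic like this with built-in string functions avoids registering a UDF at all:

```python
def ends_with_one(value, bit_position):
    """Return 1 if the bit at (negative) index bit_position is '1', else 0.

    bit_position is expected to be a negative index from the end of the
    bit string; out-of-range positions return 0 instead of raising.
    """
    if bit_position + len(value) < 0:  # index falls before the string start
        return 0
    return int(value[bit_position] == '1')

print(ends_with_one("1011", -1))  # 1: last bit is '1'
print(ends_with_one("1010", -1))  # 0: last bit is '0'
print(ends_with_one("10", -5))    # 0: out of range
```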
abelian-grape
by New Contributor II
  • 7560 Views
  • 4 replies
  • 0 kudos

Intermittent error: Databricks job kept running

Hi, I have the following error, but the job kept running. Is that normal? {     "message": "The service at /api/2.0/jobs/runs/get?run_id=899157004942769 is temporarily unavailable. Please try again later. [TraceId: -]",     "error_code": "TEMPORARILY_U...

Latest Reply
abelian-grape
New Contributor II
  • 0 kudos

@Ayushi_Suthar also, whenever it happens, the job status does not change to "failed"; it keeps running. Is that normal?

3 More Replies
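TEMPORARILY_UNAVAILABLE from /api/2.0/jobs/runs/get is a transient service error, not a job failure, which is consistent with the job continuing to run. If you poll run status yourself, a retry with exponential backoff handles it; a sketch where `poll` is a stand-in for the real HTTP call (the names here are hypothetical):

```python
import time

attempts = {"n": 0}

def poll():
    """Stand-in for GET /api/2.0/jobs/runs/get; fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("TEMPORARILY_UNAVAILABLE")
    return {"state": "RUNNING"}

def poll_with_retry(fn, retries=5, base_delay=0.01):
    """Call fn, retrying transient errors with exponential backoff."""
    for i in range(retries):
        try:
            return fn()
        except RuntimeError:
            if i == retries - 1:
                raise                      # out of retries: surface the error
            time.sleep(base_delay * 2 ** i)

print(poll_with_retry(poll)["state"])  # RUNNING
```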
joao_vnb
by New Contributor III
  • 50308 Views
  • 7 replies
  • 11 kudos

Resolved! Automate the Databricks workflow deployment

Hi everyone, do you guys know if it's possible to automate Databricks workflow deployment through Azure DevOps (like what we do with the deployment of notebooks)?

Latest Reply
asingamaneni
New Contributor II
  • 11 kudos

Did you get a chance to try Brickflow? https://github.com/Nike-Inc/brickflow You can find the documentation here: https://engineering.nike.com/brickflow/v0.11.2/ Brickflow uses Databricks Asset Bundles (DAB) under the hood but provides a Pythonic w...

6 More Replies
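Since the reply mentions Databricks Asset Bundles, here is a minimal `databricks.yml` sketch of the idea; the project name and host are placeholders. From an Azure DevOps pipeline step, deployment is then `databricks bundle deploy -t dev` with the CLI authenticated to the workspace:

```yaml
# Minimal Databricks Asset Bundles config sketch; name and host are
# placeholders, not values from this thread.
bundle:
  name: my_project

targets:
  dev:
    workspace:
      host: https://adb-<workspace-id>.azuredatabricks.net
```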
isaac_gritz
by Databricks Employee
  • 8023 Views
  • 1 reply
  • 2 kudos

Change Data Capture with Databricks

How to leverage Change Data Capture (CDC) from your databases to Databricks. Change Data Capture allows you to ingest and process only changed records from database systems to dramatically reduce data processing costs and enable real-time use cases suc...

Latest Reply
prasad95
New Contributor III
  • 2 kudos

Hi @isaac_gritz, can you provide any reference resource for achieving AWS DynamoDB CDC to Delta tables? Thank you,

DatBoi
by Contributor
  • 5290 Views
  • 2 replies
  • 1 kudos

Resolved! What happens to table created with CTAS statement when data in source table has changed

Hey all - I am sure this has been documented / answered before but what happens to a table created with a CTAS statement when data in the source table has changed? Does the sink table reflect the changes? Or is the data stored when the table is defin...

Latest Reply
SergeRielau
Databricks Employee
  • 1 kudos

CREATE TABLE AS (CTAS) is a "one and done" kind of statement. The new table retains no memory of how it came to be, so it is oblivious to changes in the source. Views, as you say, are stored queries; no data is persisted, and therefore the query...

1 More Replies
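The snapshot-vs-live distinction in the accepted answer can be demonstrated with any SQL engine; here sqlite3 stands in (an assumption — the semantics shown are the generic CTAS/view difference, not anything Databricks-specific). After the source changes, the CTAS table still holds the old rows while the view reflects the new state:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (x INT)")
con.execute("INSERT INTO src VALUES (1), (2)")

con.execute("CREATE TABLE snapshot AS SELECT * FROM src")  # CTAS: copies rows now
con.execute("CREATE VIEW live AS SELECT * FROM src")       # view: stored query only

con.execute("INSERT INTO src VALUES (3)")  # change the source afterwards

print(con.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0])  # 2
print(con.execute("SELECT COUNT(*) FROM live").fetchone()[0])      # 3
```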
Dhruv-22
by New Contributor III
  • 9665 Views
  • 4 replies
  • 1 kudos

Resolved! Managed table overwrites existing location for delta but not for oth

I am working on Azure Databricks, with the Databricks Runtime version being 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Red_blue_green
New Contributor III
  • 1 kudos

Hi, this is how the Delta format works. With overwrite you are not deleting the files in the folder or replacing them; Delta creates a new file with the overwritten schema and data. This way you are also able to return to former versions of the del...

3 More Replies
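Because overwrite writes new files rather than deleting the old ones, as the reply explains, earlier versions of the table stay queryable via Delta time travel. A sketch (the table name is a placeholder, not one from the thread):

```sql
-- Inspect the versions created by each overwrite:
DESCRIBE HISTORY f1_processed.my_table;

-- Query the table as it was before the overwrite:
SELECT * FROM f1_processed.my_table VERSION AS OF 0;
```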
sanjay
by Valued Contributor II
  • 12257 Views
  • 1 reply
  • 0 kudos

pyspark dropDuplicates performance issue

Hi, I am trying to delete duplicate records found by key, but it's very slow. It's a continuously running pipeline, so the data is not that huge, but it still takes time to execute this command: df = df.dropDuplicates(["fileName"]) Is there any better approach to d...

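Semantically, `dropDuplicates(["fileName"])` keeps one arbitrary row per key, which requires a shuffle and, on a continuously running pipeline, ever-growing state; if this is Structured Streaming, `dropDuplicatesWithinWatermark` (Spark 3.5+) bounds that state. The keep-first-per-key logic itself, sketched in plain Python with hypothetical rows:

```python
rows = [
    {"fileName": "a.csv", "size": 10},
    {"fileName": "b.csv", "size": 20},
    {"fileName": "a.csv", "size": 30},  # duplicate key, dropped
]

seen = set()
deduped = []
for r in rows:
    if r["fileName"] not in seen:   # first occurrence of this key wins
        seen.add(r["fileName"])
        deduped.append(r)

print([r["fileName"] for r in deduped])  # ['a.csv', 'b.csv']
```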
Accn
by New Contributor
  • 1160 Views
  • 1 reply
  • 0 kudos

Dashboard from Notebook - How to schedule

A notebook is created with insights and I have created a dashboard (not a SQL dashboard) from it. I need to schedule this. I have tried scheduling via a workflow - it only takes you to the notebook; even the schedule from the dashboard takes me to the notebook and not the dashbo...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Accn, thanks for bringing up your concerns, always happy to help. We understand your concern, but right now the only way to refresh a notebook dashboard is via scheduled jobs. To schedule a dashboard to refresh at a specified interval, click...


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group