Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chari
by Contributor
  • 6281 Views
  • 2 replies
  • 0 kudos

writing spark dataframe as CSV to a repo

Hi, I wrote a Spark dataframe as CSV to a repo (synced with GitHub), but when I checked the folder, the file wasn't there. Here is my code: spark_df.write.format('csv').option('header','true').mode('overwrite').save('/Repos/abcd/mno/data') No error mes...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

The folder 'Repos' is not your repo; it's `dbfs:/Repos`. Please check: dbutils.fs.ls('/Repos/abcd/mno/data')
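A minimal sketch of the fix this reply implies (the paths are the poster's; `file:/Workspace/Repos/...` is an assumption about where the Git-synced checkout lives): a bare '/Repos/...' path passed to Spark resolves against DBFS, so writing through the file: scheme targets the workspace repo instead.

```python
# Confirm where the CSV actually landed: the save above wrote to dbfs:/Repos/...
dbutils.fs.ls('/Repos/abcd/mno/data')

# Sketch, not a verified fix: write into the workspace repo checkout instead
# (assumes the repo is synced at /Workspace/Repos/abcd/mno).
(spark_df.write.format('csv')
    .option('header', 'true')
    .mode('overwrite')
    .save('file:/Workspace/Repos/abcd/mno/data'))
```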

  • 0 kudos
1 More Replies
Salman1
by New Contributor
  • 953 Views
  • 0 replies
  • 0 kudos

Cannot find UDF on subsequent job runs on same cluster.

Hello, I am trying to run jobs with a JAR task type using Databricks on AWS on an all-purpose cluster. The issue I'm facing is that the job completes the first run successfully, but on any subsequent runs it will fail. I have to restart my cluste...

chari
by Contributor
  • 3023 Views
  • 2 replies
  • 0 kudos

Fatal error when writing a big pandas DataFrame

Hello DB community, I was trying to write a pandas DataFrame containing 100,000 rows as Excel. Moments into the execution I received a fatal error: "Python kernel is unresponsive." However, I am constrained from increasing the number of clusters or other...

Data Engineering
Databricks
excel
python
Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @chari, thanks for bringing up your concerns, always happy to help. We understand that you are facing the following error while writing a pandas DataFrame containing 100,000 rows to Excel. As per the error >>> Fatal error: The Python kernel ...
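One commonly suggested workaround (an assumption on my part, not from the truncated reply) is to stream the workbook to disk with xlsxwriter's constant_memory mode so the driver never holds the whole sheet in memory:

```python
import pandas as pd

# Stand-in for the poster's 100,000-row DataFrame.
pdf = pd.DataFrame({'id': range(100_000)})

# constant_memory flushes each row to disk as it is written, capping driver
# memory use (requires the xlsxwriter package to be installed).
with pd.ExcelWriter('/tmp/out.xlsx', engine='xlsxwriter',
                    engine_kwargs={'options': {'constant_memory': True}}) as writer:
    pdf.to_excel(writer, index=False)
```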

  • 0 kudos
1 More Replies
Yaacoub
by New Contributor
  • 8980 Views
  • 2 replies
  • 1 kudos

[UDF_MAX_COUNT_EXCEEDED] Exceeded query-wide UDF limit of 5 UDFs

In my project I defined a UDF:

@udf(returnType=IntegerType())
def ends_with_one(value, bit_position):
    if bit_position + len(value) < 0:
        return 0
    else:
        return int(value[bit_position] == '1')

spark.udf.register("ends_with_one"...
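If the query-wide UDF limit is the blocker, one hedged workaround (not from the thread) is to express the check with built-in SQL functions so the query needs no Python UDF at all; Spark SQL's substr accepts a negative position counted from the end, matching the UDF's negative bit_position:

```python
from pyspark.sql import functions as F

bit_pos = -1  # hypothetical position, counted from the end as in the UDF

df = spark.createDataFrame([('1011',), ('0110',)], ['value'])
# An out-of-range position yields an empty substring, which compares unequal
# to '1' and casts to 0, mirroring the UDF's guard clause.
result = df.withColumn(
    'ends_with_one',
    (F.expr(f"substr(value, {bit_pos}, 1)") == '1').cast('int'))
```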

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Yaacoub, Just a friendly follow-up. Have you had a chance to review my colleague's reply? Please inform us if it contributes to resolving your query.

  • 1 kudos
1 More Replies
abelian-grape
by New Contributor II
  • 7384 Views
  • 4 replies
  • 0 kudos

Intermittent error databricks job kept running

Hi, I have the following error, but the job kept running. Is that normal?

{
  "message": "The service at /api/2.0/jobs/runs/get?run_id=899157004942769 is temporarily unavailable. Please try again later. [TraceId: -]",
  "error_code": "TEMPORARILY_U...
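Where a polling client hits this, a hedged sketch (function name and retry policy are assumptions, not from the thread) is to retry the runs/get call with backoff rather than treating the transient error as terminal:

```python
import time
import requests

def get_run_with_retry(host, token, run_id, attempts=5):
    """Poll /api/2.0/jobs/runs/get, retrying on TEMPORARILY_UNAVAILABLE."""
    for attempt in range(attempts):
        resp = requests.get(
            f"{host}/api/2.0/jobs/runs/get",
            headers={"Authorization": f"Bearer {token}"},
            params={"run_id": run_id},
            timeout=30,
        )
        body = resp.json()
        if body.get("error_code") != "TEMPORARILY_UNAVAILABLE":
            return body
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("runs/get still unavailable after retries")
```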

Latest Reply
abelian-grape
New Contributor II
  • 0 kudos

@Ayushi_Suthar also, whenever it happens the job status does not change to "failed"; it keeps running. Is that normal?

  • 0 kudos
3 More Replies
joao_vnb
by New Contributor III
  • 45802 Views
  • 7 replies
  • 11 kudos

Resolved! Automate the Databricks workflow deployment

Hi everyone, do you know if it's possible to automate Databricks workflow deployment through Azure DevOps (like what we do with the deployment of notebooks)?

Latest Reply
asingamaneni
New Contributor II
  • 11 kudos

Did you get a chance to try Brickflows (https://github.com/Nike-Inc/brickflow)? You can find the documentation here: https://engineering.nike.com/brickflow/v0.11.2/ Brickflow uses Databricks Asset Bundles (DAB) under the hood but provides a Pythonic w...

  • 11 kudos
6 More Replies
isaac_gritz
by Databricks Employee
  • 7824 Views
  • 1 reply
  • 2 kudos

Change Data Capture with Databricks

How to leverage Change Data Capture (CDC) from your databases to Databricks. Change Data Capture allows you to ingest and process only changed records from database systems, dramatically reducing data processing costs and enabling real-time use cases suc...
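As one concrete illustration (table name is hypothetical, not from the post), Delta's change data feed exposes per-row change records that downstream jobs can consume incrementally:

```python
# Read the change feed of a Delta table from version 0 onward; _change_type
# distinguishes inserts, updates, and deletes. Assumes the table was created
# with the delta.enableChangeDataFeed property set to true.
changes = (spark.read.format('delta')
           .option('readChangeFeed', 'true')
           .option('startingVersion', 0)
           .table('main.sales.orders'))  # hypothetical table name
changes.select('_change_type', '_commit_version').show()
```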

Latest Reply
prasad95
New Contributor III
  • 2 kudos

Hi @isaac_gritz, can you provide any reference resource for achieving AWS DynamoDB CDC to Delta tables? Thank you.

  • 2 kudos
DatBoi
by Contributor
  • 4584 Views
  • 2 replies
  • 1 kudos

Resolved! What happens to table created with CTAS statement when data in source table has changed

Hey all - I am sure this has been documented / answered before but what happens to a table created with a CTAS statement when data in the source table has changed? Does the sink table reflect the changes? Or is the data stored when the table is defin...

Latest Reply
SergeRielau
Databricks Employee
  • 1 kudos

CREATE TABLE AS (CTAS) is a "one and done" kind of statement. The new table retains no memory of how it came to be, so it is oblivious to changes in the source. Views, as you say, are stored queries; no data is persisted, and therefore the query...
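A small sketch of the contrast the reply draws (src is a hypothetical one-column table):

```python
spark.sql("CREATE TABLE t AS SELECT * FROM src")  # snapshot, frozen at creation
spark.sql("CREATE VIEW v AS SELECT * FROM src")   # stored query, re-run on read

spark.sql("INSERT INTO src VALUES (42)")
spark.table("t").count()  # unchanged: CTAS keeps no link to src
spark.table("v").count()  # one higher: the view re-evaluates src on each read
```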

  • 1 kudos
1 More Replies
Dhruv-22
by New Contributor III
  • 9025 Views
  • 4 replies
  • 1 kudos

Resolved! Managed table overwrites existing location for delta but not for oth

I am working on Azure Databricks with Databricks Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12), and I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Red_blue_green
New Contributor III
  • 1 kudos

Hi, this is how the Delta format works. With overwrite you are not deleting or replacing the files in the folder; Delta creates new files with the overwritten schema and data. This way you are also able to return to former versions of the Del...
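A brief sketch of that behaviour (df stands for the poster's DataFrame, the table name is hypothetical): an overwrite adds a new commit on top of the old files, and time travel can still read the earlier version.

```python
# Overwrite writes new data files and a new commit; the old files remain
# on storage until a VACUUM removes them.
df.write.format('delta').mode('overwrite').saveAsTable('f1_processed.t')

# Earlier versions stay readable via time travel.
old = spark.read.option('versionAsOf', 0).table('f1_processed.t')
```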

  • 1 kudos
3 More Replies
sanjay
by Valued Contributor II
  • 11801 Views
  • 1 reply
  • 0 kudos

pyspark dropDuplicates performance issue

Hi, I am trying to delete duplicate records found by key, but it's very slow. It's a continuously running pipeline, so the data is not that huge, but it still takes time to execute this command: df = df.dropDuplicates(["fileName"]) Is there any better approach to d...
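Since the thread's reply was lost to a rendering error, here is one hedged alternative for a continuously running (streaming) pipeline, under stated assumptions: bound the deduplication state with a watermark instead of tracking every key ever seen.

```python
# Sketch under assumptions: stream_df is the pipeline's streaming DataFrame and
# ingestTime is a hypothetical event-time column. The watermark limits dedup
# state to the last hour rather than the full history of fileName values.
deduped = (stream_df
           .withWatermark('ingestTime', '1 hour')
           .dropDuplicates(['fileName', 'ingestTime']))
```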

Accn
by New Contributor
  • 1048 Views
  • 1 reply
  • 0 kudos

Dashboard from Notebook - How to schedule

A notebook is created with insights, and I have created a dashboard (not a SQL dashboard) from it. I need to schedule this. I have tried scheduling via a workflow, but it only takes you to the notebook; even the schedule from the dashboard takes me to the notebook and not the dashbo...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Accn, thanks for bringing up your concerns, always happy to help. We understand your concern, but right now the only way to refresh a notebook dashboard is via scheduled jobs. To schedule a dashboard to refresh at a specified interval, click...

  • 0 kudos
chrisf_sts
by New Contributor II
  • 7789 Views
  • 1 reply
  • 1 kudos

Resolved! After moving mounted s3 bucket under unity catalog control, python file paths no longer work

I had been using a mounted external S3 bucket with JSON files up until a few days ago, when my company moved all file mounts under control of Unity Catalog. Suddenly I can no longer run a command like: with open("/mnt/my_files/my_json....

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 1 kudos

Hi @chrisf_sts, thanks for bringing up your concerns, always happy to help. May I know which cluster access mode you are using to run the notebook commands? Can you please try to run the command below in single-user cluster access mode? "with open(...
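If single-user mode is not an option, a commonly suggested path (an assumption, not from the truncated reply) is to reference the data through a Unity Catalog volume instead of the old mount:

```python
# Hypothetical volume path; Unity Catalog governs access to volumes, so plain
# Python file I/O works on access modes where /mnt paths are blocked.
with open('/Volumes/my_catalog/my_schema/my_files/my_json.json') as f:
    payload = f.read()
```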

  • 1 kudos
Miasu
by New Contributor II
  • 1019 Views
  • 0 replies
  • 0 kudos

Unable to analyze external table | FileAlreadyExistsException

Hello experts, there's a CSV file, "nyc_taxi.csv", saved under users/myfolder on DBFS, and I used this file to create 2 tables: 1. nyc_taxi: created using the UI; it appeared as a managed table saved under dbfs:/user/hive/warehouse/mydatabase.db/nyc...

