Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dcardenas
by New Contributor
  • 814 Views
  • 0 replies
  • 0 kudos

Retrieving Logs with the Jobs API get-output Service

Hello, I would like to retrieve the logs of some jobs that were launched using the Jobs REST API 2.0. I see in the docs that this can be done with the get-output service; however, each time I call the service I just get the metadata part of the response but ...

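For anyone landing here, a minimal sketch of the call in question; the host, token, and run_id are placeholders, and the key caveat is that notebook_output is only populated when the notebook task ends with dbutils.notebook.exit(...):

```python
# Minimal sketch of calling the Jobs API 2.0 runs/get-output endpoint.
# The host URL, token, and run_id below are placeholders.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

resp = requests.get(
    f"{HOST}/api/2.0/jobs/runs/get-output",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": 123456},  # for multi-task jobs, use the task run id
)
resp.raise_for_status()
out = resp.json()

# "metadata" is always returned; "notebook_output" only appears when the
# notebook task calls dbutils.notebook.exit(...). Driver/executor logs are not
# served by this endpoint; configure cluster log delivery for those.
print(out["metadata"]["state"])
print(out.get("notebook_output"))
```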
ken2
by New Contributor II
  • 2662 Views
  • 3 replies
  • 0 kudos

How to convert entity_id to notebook name or job name

Hi, Databricks developers! I use system.access.table_lineage, referring to this page. It's difficult for us to recognize which notebook is indicated by an entity_id. How do I get the table to convert entity_ids to job names or notebook names?

Latest Reply
mlamairesse
Databricks Employee
  • 0 kudos

Workflows system tables are coming very soon. 

2 More Replies
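Once the workflow system tables are enabled in your account, a hedged sketch of the join; the system.lakeflow.jobs table and its job_id/name columns are assumptions to verify against your workspace:

```python
# Hedged sketch: map JOB entity_ids in table_lineage to job names by joining
# against the jobs system table (table and column names are assumptions).
lineage_named = spark.sql("""
    SELECT l.entity_type,
           l.entity_id,
           j.name AS job_name,
           l.source_table_full_name,
           l.target_table_full_name
    FROM system.access.table_lineage AS l
    LEFT JOIN system.lakeflow.jobs AS j
           ON l.entity_type = 'JOB'
          AND l.entity_id = j.job_id
""")
display(lineage_named)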
cg3
by New Contributor
  • 915 Views
  • 0 replies
  • 0 kudos

Define VIEW in Databricks Asset Bundles?

Is it possible to define a Unity Catalog VIEW in a Databricks Asset Bundle, or specify in the bundle that a specific notebook gets run once per deployment?

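As far as we know there is no direct VIEW resource in bundles; a common workaround is to have the bundle deploy a job whose task runs a short notebook like this once per deployment (the catalog/schema/view names below are hypothetical):

```python
# Hedged sketch: a notebook the bundle runs as a job task to (re)create the view.
spark.sql("""
    CREATE OR REPLACE VIEW main.analytics.orders_v AS
    SELECT order_id, amount
    FROM main.analytics.orders
    WHERE amount > 0
""")
```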
Kishan1003
by New Contributor
  • 3595 Views
  • 1 reply
  • 0 kudos

Merge Operation is very slow for S/4 Table ACDOCA

Hello, we have a scenario in Databricks where every day we get 60-70 million records, and it takes a lot of time to merge them into the 28 billion records that are already sitting there. The time taken to rewrite the files which are affected is too ...

Latest Reply
177991
New Contributor II
  • 0 kudos

Hi @Kishan1003, did you find something helpful? I'm dealing with a similar situation; the ACDOCA table on my side is around 300M records (fairly smaller), and the incoming daily data is usually around 1M. I have tried partitioning by period, like the fiscyearper column, zo...

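For readers hitting the same wall, a hedged sketch of the usual mitigation: constrain the MERGE condition to the partitions actually touched by the batch so Delta can prune files instead of scanning the whole multi-billion-row table. Table names and the key columns below are assumptions:

```python
# Hedged sketch: prune the MERGE to only the fiscal periods present in the batch.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "main.s4.acdoca")   # hypothetical target
updates = spark.table("main.s4.acdoca_daily")          # hypothetical staging table

# Collect the partition values touched by today's batch (assumes the table is
# partitioned or clustered on fiscyearper).
periods = [r["fiscyearper"] for r in
           updates.select("fiscyearper").distinct().collect()]
period_list = ", ".join(f"'{p}'" for p in periods)

(target.alias("t")
    .merge(
        updates.alias("s"),
        f"t.fiscyearper IN ({period_list}) "
        "AND t.belnr = s.belnr AND t.docln = s.docln"  # assumed key columns
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```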
costi9992
by New Contributor III
  • 6326 Views
  • 6 replies
  • 0 kudos

Resolved! Add policy init_scripts.*.volumes.destination for DLT not working

Hi, I tried to create a policy to use for DLT pipelines that run on shared clusters, but when I run the DLT pipeline with this policy I get an error. The init script is added to Allowed JARs/Init Scripts. DLT events error: Cluster scoped init script /Volumes/main/...

Latest Reply
ayush007
New Contributor II
  • 0 kudos

@costi9992 I am facing the same issue with a UC-enabled cluster on Databricks Runtime 13.3. I have uploaded the init shell script to a Volume, with that particular init script allowed by the metastore admin, but I get the same error you stated. When I looked in clus...

5 More Replies
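For reference, a hedged sketch of the policy shape discussed in this thread, created via the Cluster Policies API; the Volume path and policy name are assumptions, and the script must also be on the metastore allowlist as noted above:

```python
# Hedged sketch: pin an allowed Volumes init script in a cluster policy.
import json
import requests

definition = {
    "init_scripts.0.volumes.destination": {
        "type": "fixed",
        "value": "/Volumes/main/default/scripts/init.sh",  # hypothetical path
    }
}

resp = requests.post(
    "https://<your-workspace>.cloud.databricks.com/api/2.0/policies/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"name": "dlt-volumes-init-script", "definition": json.dumps(definition)},
)
resp.raise_for_status()
print(resp.json()["policy_id"])
```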
shivam-singh
by New Contributor
  • 1299 Views
  • 1 reply
  • 0 kudos

Databricks-Autoloader-S3-KMS

Hi, I am working on a requirement where I am using Auto Loader in a DLT pipeline to ingest new files as they come. This flow is working fine. However, I am facing an issue when the source bucket is an S3 location, since the bucket has SSE-...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Can you please paste the exact errors and check the following if it's related to KMS: 1. The IAM role policy and the KMS policy should have allow permissions. 2. Did you use extraConfig while mounting the source S3 bucket? If you have used an IAM role...

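A hedged sketch of the s3a settings that usually matter for an SSE-KMS bucket; the key ARN and paths are placeholders, and the instance profile also needs kms:Decrypt/kms:GenerateDataKey on that key:

```python
# Hedged sketch: point the s3a client at the bucket's KMS key, then read with
# Auto Loader as usual. The ARN, schema location, and bucket are placeholders.
spark.conf.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
spark.conf.set(
    "fs.s3a.server-side-encryption.key",
    "arn:aws:kms:us-east-1:123456789012:key/<key-id>",
)

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "s3a://my-bucket/_schemas/src/")
      .load("s3a://my-bucket/landing/"))
```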
esalohs
by New Contributor III
  • 9943 Views
  • 6 replies
  • 4 kudos

Databricks Autoloader - list only new files in an s3 bucket/directory

I have an S3 bucket with a couple of subdirectories/partitions like s3a://Bucket/dir1/ and s3a://Bucket/dir2/. There are currently millions of files sitting in the bucket in the various subdirectories/partitions. I'm getting new data in near real t...

Latest Reply
kulkpd
Contributor
  • 4 kudos

The options below were used while performing spark.readStream: .option('cloudFiles.format', 'json').option('cloudFiles.inferColumnTypes', 'true').option('cloudFiles.schemaEvolutionMode', 'rescue').option('cloudFiles.useNotifications', True).option('skipChange...

5 More Replies
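A hedged, untruncated version of the options quoted above: with millions of pre-existing files, file-notification mode plus skipping the backlog avoids repeated full listings. The paths are placeholders:

```python
# Hedged sketch: notification-based discovery so only newly arriving files are
# picked up, rather than re-listing the existing millions.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.inferColumnTypes", "true")
      .option("cloudFiles.schemaEvolutionMode", "rescue")
      .option("cloudFiles.useNotifications", "true")       # SQS/SNS file events
      .option("cloudFiles.includeExistingFiles", "false")  # ignore the backlog
      .option("cloudFiles.schemaLocation", "s3a://Bucket/_schemas/dir1/")
      .load("s3a://Bucket/dir1/"))
```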
Muhammed
by New Contributor III
  • 27737 Views
  • 13 replies
  • 0 kudos

Filtering files for query

Hi Team, while writing my data to a data lake table I am getting 'Filtering files for query', and the write gets stuck there. How can I resolve this issue?

Latest Reply
kulkpd
Contributor
  • 0 kudos

My bad, somewhere in the screenshot I saw that, but I am not able to find it now. Which source are you using to load the data: a Delta table, AWS S3, or Azure Storage?

12 More Replies
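For context, 'Filtering files for query' is the phase where Delta scans the target table to find the files matching the write/MERGE condition. A hedged sketch of the usual remedies; the table and column names are assumptions:

```python
# Hedged sketch: compact small files and co-locate rows on the join/merge key so
# the file-filtering phase touches fewer files.
spark.sql("OPTIMIZE main.default.events ZORDER BY (event_id)")  # hypothetical names

# If the target is partitioned, include the partition column in the MERGE/write
# condition so whole partitions can be pruned instead of listed file by file.
```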
geetha_venkates
by New Contributor II
  • 12406 Views
  • 7 replies
  • 2 kudos

Resolved! How do we add a certificate file in Databricks for a spark-submit type of job?

How do we add a certificate file in Databricks for a spark-submit type of job?

Latest Reply
nicozambelli
New Contributor II
  • 2 kudos

I have the same problem... when I worked with the hive_metastore in the past, I was able to use the file system and also use API certs. Now I'm using Unity Catalog and I can't upload a certificate. Can somebody help me?

6 More Replies
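On the Unity Catalog point in the last reply, a hedged sketch of one pattern that still works: store the PEM file in a Volume and reference it explicitly. The paths and URL are placeholders:

```python
# Hedged sketch: keep the certificate in a UC Volume and pass it to the client.
import requests

CERT_PATH = "/Volumes/main/default/certs/internal_ca.pem"  # hypothetical path

resp = requests.get("https://internal-service.example.com/api", verify=CERT_PATH)
print(resp.status_code)
```

For the spark-submit case, the same idea applies at the JVM level: a truststore file (JKS rather than PEM) can be wired in via spark.driver.extraJavaOptions with -Djavax.net.ssl.trustStore=..., assuming your cluster policy allows those conf keys.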
RobinK
by Contributor
  • 17717 Views
  • 5 replies
  • 6 kudos

Resolved! How to set Python rootpath when deploying with DABs

We have structured our code according to the documentation (notebooks-best-practices). We use Jupyter notebooks and have outsourced logic to Python modules. Unfortunately, the example described in the documentation only works if you have checked out ...

Latest Reply
Corbin
Databricks Employee
  • 6 kudos

Hello Robin, you'll either have to use wheel files to package your libs and use those (see docs here) to make imports work out of the box, or your entry point file needs to add the bundle root directory (or whatever the lib directory is) to ...

4 More Replies
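A hedged sketch of the second option Corbin mentions; the directory depth and module name are assumptions about the bundle layout:

```python
# Hedged sketch: an entry-point file that puts the bundle root on sys.path so
# sibling modules import cleanly after `databricks bundle deploy`.
import os
import sys

# Walk up from this file to the assumed bundle root (adjust ".." to your layout).
bundle_root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if bundle_root not in sys.path:
    sys.path.insert(0, bundle_root)

from src.jobs.etl import main  # hypothetical module layout
main()
```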
Kumarashokjmu
by New Contributor II
  • 5304 Views
  • 4 replies
  • 0 kudos

Need to ingest millions of CSV files from AWS S3

I have a need to ingest millions of CSV files from an AWS S3 bucket. I am facing an S3 throttling issue, and besides that, the notebook process runs for 8+ hours and sometimes fails. Looking at cluster performance, it is utilized at 60%. I...

Latest Reply
kulkpd
Contributor
  • 0 kudos

If you want to load all the data at once, use Auto Loader or a DLT pipeline with directory listing if the files are lexically ordered. Or, if you want to perform an incremental load, divide the load into two jobs, like historic data load vs. live data load: live data...

3 More Replies
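A hedged sketch of the "live" half of that split: notification-based Auto Loader with bounded batches, so S3 listing and request rates stay under control. Paths and table names are placeholders:

```python
# Hedged sketch: incremental CSV ingestion with capped batch sizes.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.maxFilesPerTrigger", "10000")  # cap per micro-batch
      .option("cloudFiles.schemaLocation", "s3://bucket/_schema/landing/")
      .load("s3://bucket/landing/"))

(df.writeStream
   .option("checkpointLocation", "s3://bucket/_chk/landing/")
   .trigger(availableNow=True)  # drain the backlog in bounded increments
   .toTable("main.default.raw_csv"))
```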
leelee3000
by Databricks Employee
  • 1162 Views
  • 0 replies
  • 0 kudos

Dynamic Filtering Criteria for Data Streaming

One of the potential uses for DLT is a scenario where I have a large input stream of data and need to create multiple smaller streams based on dynamic and adjustable filtering criteria. The challenge is to allow non-engineering individuals to adjust ...

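A hedged sketch of one way to approach this in DLT: keep the filter expressions in a small config table that non-engineers can edit, and generate one table per rule when the pipeline starts. The rules table, its columns, and the upstream "raw_events" table are assumptions:

```python
# Hedged sketch: fan one input stream out into per-rule DLT tables driven by a
# config table of (name, condition) rows.
import dlt
from pyspark.sql import functions as F

rules = spark.table("main.config.stream_rules").collect()  # hypothetical table

def define_table(name, condition):
    @dlt.table(name=f"filtered_{name}")
    def _filtered():
        # "raw_events" is an assumed upstream table in the same pipeline.
        return dlt.read_stream("raw_events").where(F.expr(condition))

for row in rules:
    define_table(row["name"], row["condition"])
```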
leelee3000
by Databricks Employee
  • 1716 Views
  • 0 replies
  • 0 kudos

Parameterizing DLT Jobs

I have observed the use of advanced configuration and creating a map as a way to parameterize notebooks, but these appear to be cluster-wide settings. Is there a recommended best practice for directly passing parameters to notebooks running on a DLT ...

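A hedged sketch of the commonly recommended pattern: set key/value pairs in the pipeline's configuration (which is pipeline-scoped, not cluster-wide), then read them in the notebook via spark.conf. The key name and path are placeholders:

```python
# Hedged sketch: a pipeline configuration entry like
# {"mypipe.source_path": "/Volumes/main/default/in"} surfaces through spark.conf.
import dlt

source_path = spark.conf.get("mypipe.source_path")  # hypothetical key

@dlt.table
def bronze():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(source_path))
```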
Geoff
by New Contributor II
  • 1783 Views
  • 0 replies
  • 1 kudos

Bizarre Delta Tables pipeline error: ModuleNotFound

I received the following error when trying to import a function defined in a .py file into a .ipynb file. I would add code blocks, but the message keeps getting rejected for invalid HTML. # test_lib.py (same directory, in a subfolder) def square(x): ret...

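A hedged sketch of the usual fix: pipelines do not automatically put the notebook's folder (or its subfolders) on sys.path, so add the folder explicitly before the import. The subfolder name is an assumption based on the post:

```python
# Hedged sketch: make the subfolder containing test_lib.py importable.
import os
import sys

# The working directory may differ from the notebook's folder in a pipeline,
# so an absolute path is safer; "subfolder" is a hypothetical name.
sys.path.append(os.path.abspath("./subfolder"))

from test_lib import square

print(square(4))  # 16, assuming square returns x * x
```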
pankz-104
by New Contributor
  • 1962 Views
  • 1 reply
  • 0 kudos

How to read deleted files in ADLS

We have soft delete enabled in ADLS for 3 days, and we have manually deleted some checkpoint files, around 3 TB in total. Each file is just a couple of bytes, like 30 B or 40 B. The deleted file size is increasing day by day, even after a couple of days. Suppose ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @pankz-104, just a friendly follow-up. Did you have time to test Kaniz's recommendations? Do you still have issues? Please let us know.

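For anyone needing the deleted blobs back, a hedged sketch of how soft-deleted blobs can be listed and restored with the azure-storage-blob SDK while the 3-day retention window is open. The connection string and container name are placeholders:

```python
# Hedged sketch: list soft-deleted blobs and undelete them.
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<storage-connection-string>",
    container_name="checkpoints",  # hypothetical container
)

for blob in container.list_blobs(include=["deleted"]):
    if blob.deleted:
        container.get_blob_client(blob.name).undelete_blob()
```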
