Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

superanna
by New Contributor II
  • 658 Views
  • 1 replies
  • 1 kudos

Yes, still illegal. And I also don’t understand why it is equated with drugs, but alcohol is not! Not a single murder has yet been committed under cannabis, not a single war has been unleashed. It's just that people who don't use don't understand how...

Latest Reply
Mz_Yvette
New Contributor II
  • 1 kudos

You are absolutely right! I have found it to be a big relief medically. I have nerve conditions which is not operable. The legal medical pills almost literally killed me, and if it wasn't for my husband's quick thinking, I wouldn't be here to share t...

Priyag1
by Honored Contributor II
  • 2545 Views
  • 4 replies
  • 4 kudos

Data preparation in Databricks

Good data is important to ensure accurate and useful results. To get good data, the following tasks must be done: Cleaning and formatting data - handling missing values or outliers, ensuring data is in the correct format, and...

Latest Reply
dplante
Contributor II
  • 4 kudos

Data governance and data lineage are other things to call out. Here's a cheat sheet that is also useful: Data Preparation Cheatsheet

3 More Replies
gpierard
by New Contributor III
  • 542 Views
  • 1 replies
  • 0 kudos

Badge not received for Databricks Certified Data Engineer Associate

Hello, I passed the certification but haven't received a badge. In fact, I created my Databricks Academy account only after completing the test. Could you please make sure I receive that certification? Thanks

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @gpierard, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

matty_f
by New Contributor II
  • 4336 Views
  • 1 replies
  • 0 kudos

Migration scripts for distribution, embedded in library

I'm working on a Python package that can be installed via pip. The package will manage a Delta table for the user, and new versions of the package may need to run migrations on this table. Is this an okay format to use? def migrate(table_path): mm_p...
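
One possible shape for package-embedded migrations, as a minimal sketch only: the post's own `migrate` body is truncated, so this is not a reconstruction of it. The `MIGRATIONS` registry, the `migration` decorator, `pending_versions`, and the extra `current_version` argument are all hypothetical names introduced for illustration. Where the current version is stored (for example alongside the table) is left to the caller.

```python
from typing import Callable, Dict, List

# Hypothetical registry: schema version -> function that upgrades the table to that version.
MIGRATIONS: Dict[int, Callable[[str], None]] = {}


def migration(version: int):
    """Register a migration step under the schema version it produces."""
    def register(func: Callable[[str], None]) -> Callable[[str], None]:
        MIGRATIONS[version] = func
        return func
    return register


def pending_versions(current_version: int) -> List[int]:
    """Return the registered versions newer than current_version, in ascending order."""
    return sorted(v for v in MIGRATIONS if v > current_version)


def migrate(table_path: str, current_version: int) -> int:
    """Run every pending step in order and return the version reached."""
    for version in pending_versions(current_version):
        MIGRATIONS[version](table_path)  # each step receives the table path
        current_version = version
    return current_version
```

Each release of the package would register its own step with `@migration(n)`, so a user upgrading across several releases gets all intermediate steps applied in order, and calling `migrate` on an up-to-date table does nothing.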

Latest Reply
matty_f
New Contributor II
  • 0 kudos

Not much community happening here 

Dekova
by New Contributor II
  • 552 Views
  • 0 replies
  • 0 kudos

Structured Streaming & Workspace Job Limits

In "Advanced Data Engineering with Databricks", the section on Bronze Ingestion Patterns mentions that workspaces have a limit of 5000 jobs triggered per hour. As a solution, it suggests multiplexing streams into a single bronze table and then using sub...

Data Engineering
structured streaming
brickster_2018
by Esteemed Contributor
  • 3428 Views
  • 2 replies
  • 3 kudos

Resolved! Can I install notebook scoped JAR/Maven libraries?

Notebook-scoped libraries are very handy. Is it possible to use the same mechanism for Maven JARs or application JARs as well?

Latest Reply
Pratik_Ghosh
New Contributor II
  • 3 kudos

Any further update on this topic?

1 More Replies
Ruby8376
by Valued Contributor
  • 511 Views
  • 0 replies
  • 0 kudos

Schema definition help in Scala notebook in Databricks

I am building a schema for an incoming Avro file (JSON message) and creating a final DataFrame from it. The schema looks fine against the JSON sample message provided, but I am getting null values in all the fields. Can somebody look at this code and...

erigaud
by Honored Contributor
  • 9244 Views
  • 5 replies
  • 3 kudos

Resolved! Gracefully stop a job based on condition

Hello, I have a job with many tasks running on a schedule, and the first task checks a condition. Based on the condition, I want either to continue the job as normal or to stop right away without running the other tasks. Is there a way to d...

Latest Reply
erigaud
Honored Contributor
  • 3 kudos

I think the best way to accomplish this would be to either propagate the check, as mentioned by @menotron, or have the initial task in another job, and only run the second job if the condition is met. Obviously it depends on the use case. Thank you ...

4 More Replies
Ria
by New Contributor
  • 2121 Views
  • 4 replies
  • 0 kudos

How to build a master workflow for all the jobs in Databricks Workflows?

Suppose multiple jobs have been created using Databricks Workflows. The requirement is to build one master workflow that triggers all of them depending on different conditions, for example: some are supposed to trigger on a daily basis, some on mon...

Latest Reply
pvignesh92
Honored Contributor
  • 0 kudos

@Ria Hi, this feature was in development when I attended the last quarterly roadmap session, and I thought it would be available in the latest versions or possibly in Private Preview. You can check with your Databricks Solution Architect. Even if not now, could be ...

3 More Replies
s041507
by New Contributor
  • 502 Views
  • 0 replies
  • 0 kudos

Autoloader cannot load files from Repos with runtime 13.0+

Since runtime 13.0, it is no longer possible to reference Repos files with Autoloader using the "file:" prefix, e.g. "file:/Workspace/Repos/...". This was working before, but now Autoloader throws an error: com.databricks.sql.cloudfiles.errors.Cloud...

Data Engineering
autoloader
Repos
Magnus
by Contributor
  • 1974 Views
  • 3 replies
  • 1 kudos

Auto Loader fails when reading json element containing space

I'm using Auto Loader as part of a Delta Live Tables pipeline to ingest JSON files, and today it failed with this error message: com.databricks.sql.transaction.tahoe.DeltaAnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column ...

Data Engineering
Auto Loader
Delta Live Tables
Latest Reply
Tharun-Kumar
Honored Contributor II
  • 1 kudos

@Magnus You can read the input file using Pandas or Koalas (https://koalas.readthedocs.io/en/latest/index.html), then rename the columns, then convert the Pandas/Koalas dataframe to a Spark dataframe. You can write it back with the correct column name, so ...
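
The renaming step could look roughly like the sketch below. It only computes cleaned names from the character set quoted in the error message (`' ,;{}()\n\t='`); the helper name `sanitize_column_names` and the underscore replacement are assumptions for illustration, not something from the thread.

```python
import re
from typing import List

# Characters listed in the error message: space , ; { } ( ) newline tab =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_names(names: List[str], replacement: str = "_") -> List[str]:
    """Replace every character Delta rejects in a column name with `replacement`."""
    return [INVALID_CHARS.sub(replacement, name) for name in names]
```

On a Spark DataFrame the result can be applied with `df.toDF(*sanitize_column_names(df.columns))`. This covers top-level columns only; fields nested inside structs would need separate handling, and two different names could collapse to the same cleaned name.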

2 More Replies
207474
by New Contributor
  • 1400 Views
  • 3 replies
  • 2 kudos

How do I get the total number of queries run per day on a databricks SQL warehouse/endpoint?

I am trying to access the API: GET https://<databricks-instance>.cloud.databricks.com/api/2.0/sql/history/queries
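
Once the query history has been fetched, the per-day total is a simple aggregation. The sketch below assumes each returned query record carries a `query_start_time_ms` field (epoch milliseconds); that field name should be checked against the API documentation for your workspace. Fetching, authentication, and pagination are deliberately not shown, and `count_queries_per_day` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Mapping


def count_queries_per_day(queries: Iterable[Mapping]) -> Dict[str, int]:
    """Count query records per UTC calendar day, keyed by ISO date (YYYY-MM-DD)."""
    counts: Counter = Counter()
    for query in queries:
        # Assumed field: start time in milliseconds since the Unix epoch.
        started = datetime.fromtimestamp(query["query_start_time_ms"] / 1000, tz=timezone.utc)
        counts[started.date().isoformat()] += 1
    return dict(counts)
```

Days are bucketed in UTC here; if the warehouse is used mostly in one time zone, converting to that zone before taking the date may give more meaningful daily totals.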

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hi there @Sravan Burla, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you...

2 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 755 Views
  • 0 replies
  • 0 kudos

Delta Live Tables Example Questions

I am testing with some examples of Delta Live Tables from https://github.com/databricks/delta-live-tables-notebooks/tree/main/divvy-bike-demo I have run all the relevant ingestion files: python-weatherinfo-api-ingest.py python-divvybike-api-ingest-st...

Nick_Hughes
by New Contributor III
  • 4971 Views
  • 3 replies
  • 1 kudos

Best way to generate fake data using underlying schema

Hi, we are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...

Latest Reply
RonanStokes_DB
New Contributor III
  • 1 kudos

Hi @Nick_Hughes This may be late for your scenario, but hopefully others facing similar issues will find it useful. You can specify how data is generated in `dbldatagen` using rules in the data generation spec. If rules are specified for data generat...

2 More Replies
Raghav2
by New Contributor
  • 5670 Views
  • 1 replies
  • 0 kudos

AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<col>` already exists. Consider to choose an

Hey Guys, I'm facing this exception while trying to read a public S3 bucket: "AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<column name>` already exists. Consider to choose another name or rename the existing column." Also, the thing is I...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

You can use dbutils to read the file: %fs head <s3 path>
